2  In-class assignment

Author

Sinan Ocaktan

Published

October 21, 2022

2.1 NYC Flights

Here are the first six rows of our dataset.

kable(head(df))
tailnum  year  type                     manufacturer      model      engines  seats  speed  engine
N10156   2004  Fixed wing multi engine  EMBRAER           EMB-145XR  2        55     NA     Turbo-fan
N102UW   1998  Fixed wing multi engine  AIRBUS INDUSTRIE  A320-214   2        182    NA     Turbo-fan
N103US   1999  Fixed wing multi engine  AIRBUS INDUSTRIE  A320-214   2        182    NA     Turbo-fan
N104UW   1999  Fixed wing multi engine  AIRBUS INDUSTRIE  A320-214   2        182    NA     Turbo-fan
N10575   2002  Fixed wing multi engine  EMBRAER           EMB-145LR  2        55     NA     Turbo-fan
N105UW   1999  Fixed wing multi engine  AIRBUS INDUSTRIE  A320-214   2        182    NA     Turbo-fan

2.1.1 Part 1

2.1.1.1 Calculation

We can calculate the mean number of seats per aircraft for each year and visualize it on a plot. We keep only years after 1985 to get more stable results.

seats <- df %>% group_by(year) %>%
  dplyr::summarize(Mean = mean(seats)) %>% filter(year > 1985)
seats[order(seats$year), ]
# A tibble: 28 × 2
    year  Mean
   <int> <dbl>
 1  1986  185.
 2  1987  181.
 3  1988  190.
 4  1989  163.
 5  1990  179.
 6  1991  181.
 7  1992  194.
 8  1993  194.
 9  1994  174.
10  1995  187.
# … with 18 more rows
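
The early years are presumably dropped because they contain few aircraft, which makes their means noisy. One way to check this is to report the group size next to each mean. The following is a minimal sketch on a small made-up data frame; `toy` and `seats_n` are hypothetical names and the values are not taken from the dataset above.

```r
library(dplyr)

# Hypothetical toy data, NOT the dataset used above
toy <- data.frame(
  year  = c(1980, 1990, 1990, 1990, 2000, 2000),
  seats = c(4, 150, 180, 210, 55, 182)
)

# Mean seats per year, together with the number of aircraft behind each mean
seats_n <- toy %>%
  group_by(year) %>%
  summarize(Mean = mean(seats, na.rm = TRUE), n = n()) %>%
  arrange(year)

seats_n
```

A year with a very small `n` is a candidate for filtering, since a single unusual aircraft can shift its mean a lot.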

2.1.1.2 Visualization

ggplot(data=seats,aes(x=year,y=Mean)) +
  geom_line()

In the early 2000s, the mean number of seats decreased significantly. Note that the y-axis does not start at zero, so the drop looks larger than it would on a zero-based axis.
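
If a zero-based y-axis is preferred, ggplot2's `expand_limits()` can extend the axis to include zero. The sketch below uses a made-up summary table; `toy_seats` and its values are hypothetical and are not the means computed above.

```r
library(ggplot2)

# Hypothetical toy summary, NOT the values computed above
toy_seats <- data.frame(
  year = 1998:2003,
  Mean = c(190, 185, 170, 150, 140, 135)
)

p <- ggplot(data = toy_seats, aes(x = year, y = Mean)) +
  geom_line() +
  expand_limits(y = 0)  # extend the y-axis so that it includes zero

p
```

Whether to include zero is a judgment call: it shows the change in proportion to the total, but it also compresses the year-to-year variation.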

2.1.2 Part 2

2.1.2.1 Calculation

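We can count the aircraft for each manufacturer and year, keeping only combinations with more than 10 aircraft.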

man <- df %>% count(manufacturer, year, sort = TRUE) %>% filter(n > 10)
man
# A tibble: 79 × 3
   manufacturer      year     n
   <chr>            <int> <int>
 1 BOEING            2001   142
 2 BOEING            2000   134
 3 BOEING            1999   124
 4 BOEING            1998   103
 5 AIRBUS INDUSTRIE  2001    82
 6 AIRBUS INDUSTRIE  2000    80
 7 BOEING            2004    77
 8 BOMBARDIER INC    2004    72
 9 BOEING            2008    68
10 BOEING            2006    66
# … with 69 more rows

2.1.2.2 Visualization

ggplot(data=man,aes(x=year,y=n,color=manufacturer)) +
  geom_line()
Warning: Removed 1 row(s) containing missing values (geom_path).

We can see how the popularity of each manufacturer, measured by its aircraft count, changes by year.
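
The warning printed with the plot says that one row could not be drawn. A likely cause, though the output shown does not confirm it, is an `NA` in `year`. Under that assumption, the row can be dropped explicitly before plotting. The sketch below uses a made-up table; `toy_man`, `man_clean` and their values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy counts, NOT the table computed above
toy_man <- data.frame(
  manufacturer = c("A", "A", "A", "B", "B", "B"),
  year         = c(1999, 2000, NA, 1999, 2000, 2001),
  n            = c(12, 15, 11, 20, 25, 30)
)

# Assumption: the missing value is in `year`; drop those rows before plotting
man_clean <- toy_man %>% filter(!is.na(year))

p_man <- ggplot(data = man_clean, aes(x = year, y = n, color = manufacturer)) +
  geom_line()

p_man
```

Filtering explicitly makes the removal visible in the code instead of leaving it to a plotting warning.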